-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fetch Migration] Enable migration for identical, empty target cluster index #390
[Fetch Migration] Enable migration for identical, empty target cluster index #390
Conversation
Codecov Report
@@ Coverage Diff @@
## main #390 +/- ##
============================================
+ Coverage 63.55% 63.64% +0.09%
- Complexity 715 718 +3
============================================
Files 82 82
Lines 3298 3298
Branches 303 303
============================================
+ Hits 2096 2099 +3
+ Misses 1014 1011 -3
Partials 188 188
Flags with carried forward coverage won't be shown. Click here to find out more. |
d968a25
to
29b6d8f
Compare
resp = requests.get(actual_endpoint, auth=endpoint.auth, verify=endpoint.verify_ssl) | ||
def doc_count(indices: set, endpoint: EndpointInfo) -> IndexDocCount: | ||
actual_endpoint = endpoint.url + ','.join(indices) + __SEARCH_COUNT_PATH | ||
resp = requests.get(actual_endpoint, auth=endpoint.auth, verify=endpoint.verify_ssl, json=__SEARCH_COUNT_PAYLOAD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be wrapped in an exception block printing errors or warnings when there is a connection problem. Additionally, status codes should be checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got an open issue tracking such fixes across the implementation
deployment/cdk/opensearch-service-migration/dp_pipeline_template.yaml
Outdated
Show resolved
Hide resolved
sink: | ||
- opensearch: | ||
bulk_size: 10 | ||
# DO NOT CHANGE the hosts value - this will be updated by the CDK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as comment above: I would add a note that they would need to change these only if running locally (in Docker)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the most recent updates to the Fetch Migration CDK, the hosts
value in the sink
will be replaced by an inline target host value through the CDK, so changing this value will not affect that logic nor reflect in the final setup. I've updated the comment here to:
# Note - CDK code will replace this value with the target cluster endpoint
This is in contrast to the hosts
value in source
where the string value matters because the CDK performs a string substitution to inject the source cluster endpoint.
c9fc513
to
fd9edd2
Compare
This is encapsulated in an IndexDocCount class, which also track the total doc count Signed-off-by: Kartik Ganesh <[email protected]>
…cluster indices This change includes refactoring the index difference logic to an IndexDiff class, and an additional network call to the target cluster to fetch document counts for identical indices to determine which ones are empty. Signed-off-by: Kartik Ganesh <[email protected]>
This change adds documentation to the dp_pipeline_template.yaml file and changes some reasonable defaults. It also bumps the Fetch Migration task definition to 1 CPu and 4 GB memory. Signed-off-by: Kartik Ganesh <[email protected]>
Signed-off-by: Kartik Ganesh <[email protected]>
Signed-off-by: Kartik Ganesh <[email protected]>
Signed-off-by: Kartik Ganesh <[email protected]>
fd9edd2
to
54bead7
Compare
Signed-off-by: Kartik Ganesh <[email protected]>
Description
This change enables Fetch Migration to migrate data for indices that are identical between the source and target cluster, and the document count on the target cluster is zero. Such behavior allow users to execute Fetch Migration in an idempotent manner without having to manually remove empty indices on the target cluster (that were created by a previous run of Fetch Migration that did not complete successfully)
Prior to this change, all indices that were identical between the source and target clusters were ineligible for data migration. Now, we check the doc_count for identical indices on the target cluster and only exclude identical indices with a non-zero document count.
Testing
Also tested end-to-end via the CDK. Log output from a test run:
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.